A Variable Selection Criterion for Linear Discriminant Rule and its Optimality in High Dimensional Setting
Authors
Abstract
In this paper, we propose a new variable selection procedure, called MEC, for the linear discriminant rule in the high-dimensional setting. MEC is derived as a second-order unbiased estimator of the misclassification error probability of the linear discriminant rule. We show that MEC not only decomposes into ‘fitting’ and ‘penalty’ terms, like AIC and Mallows' Cp, but is also asymptotically optimal in the sense that it achieves the smallest possible conditional probability of misclassification among the candidate variable sets. Simulation studies show that MEC performs well at selecting the true variable set.
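The general idea of choosing a variable subset by an estimate of the misclassification error can be illustrated with a small sketch. The MEC formula itself is not given in this abstract, so the code below substitutes a plain leave-one-out cross-validation error for it; this is a stand-in, not the authors' criterion. The function names (`lda_predict`, `loo_error`, `select_subset`), the two-class setup with equal priors, and the simulated data are all assumptions made for illustration.

```python
import itertools
import numpy as np

def lda_predict(X_train, y_train, x_new):
    """Two-class linear discriminant rule (pooled covariance, equal priors)."""
    X0, X1 = X_train[y_train == 0], X_train[y_train == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance with divisor n0 + n1 - 2.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X_train) - 2)
    w = np.linalg.solve(S, m1 - m0)
    return int(w @ (x_new - (m0 + m1) / 2) > 0)

def loo_error(X, y):
    """Leave-one-out estimate of the misclassification rate (stand-in for MEC)."""
    n = len(y)
    mask = np.ones(n, dtype=bool)
    errors = 0
    for i in range(n):
        mask[i] = False  # hold out observation i
        errors += int(lda_predict(X[mask], y[mask], X[i]) != y[i])
        mask[i] = True
    return errors / n

def select_subset(X, y):
    """Exhaustive search; ties are resolved in favour of the smaller subset."""
    best_subset, best_err = None, np.inf
    for size in range(1, X.shape[1] + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            err = loo_error(X[:, list(subset)], y)
            if err < best_err:
                best_subset, best_err = subset, err
    return best_subset, best_err

# Simulated data (assumption): only variable 0 separates the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 4))
X[y == 1, 0] += 3.0

best_subset, best_err = select_subset(X, y)
print("selected variables:", best_subset, "estimated error:", best_err)
```

Exhaustive search is feasible only for a handful of variables. A criterion with an explicit penalty term, as the abstract describes for MEC, avoids refitting the rule for every held-out observation.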
Similar resources
Feature Selection in High-Dimensional Classification
High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established...
Asymptotic optimality of a cross-validatory predictive approach to linear model selection
Abstract: In this article we study the asymptotic predictive optimality of a model selection criterion based on the cross-validatory predictive density, already available in the literature. For a dependent variable and associated explanatory variables, we consider a class of linear models as approximations to the true regression function. One selects a model among these using the criterion unde...
A linear constrained distance-based discriminant analysis for hyperspectral image classification
Fisher's linear discriminant analysis (LDA) is a widely used technique for pattern classification problems. It employs Fisher's ratio, the ratio of the between-class scatter matrix to the within-class scatter matrix, to derive a set of feature vectors by which high-dimensional data can be projected onto a low-dimensional feature space so as to maximize class separability. This paper presents a linea...
Asymptotic optimality of sparse linear discriminant analysis with arbitrary number of classes
Many sparse linear discriminant analysis (LDA) methods have been proposed to overcome the major problems of the classic LDA in high-dimensional settings. However, the asymptotic optimality results are limited to the case that there are only two classes, which is due to the fact that the classification boundary of LDA is a hyperplane and explicit formulas exist for the classification error in th...
The subselect R package
The subselect package addresses the issue of variable selection in different statistical contexts, among which exploratory data analyses; univariate or multivariate linear models; generalized linear models; principal components analysis; linear discriminant analysis, canonical correlation analysis. Selecting variable subsets requires the definition of a numerical criterion which measures the qu...